Comprehensive Data Lake For Both Structured And Unstructured Data

Sturdy Statistics combines all of our automatically imputed thematic structure, classification predictions, and search rankings with any structured data you already have into one unified data lake. This lets you query all of these results in one place, with a single SQL syntax.

With Sturdy Statistics, you can put your unstructured data on an equal footing with your structured data and analyze both with the same set of tools!

Let’s say we want to build a retrieval-augmented generation (RAG) application on top of our dataset of tech earnings calls, and we want to know how Nvidia and Alphabet each discuss Automotive AI. With one SQL statement, we can combine a metadata filter on the company’s ticker with a thematic filter on the document content. We can then retrieve the top-matching paragraphs and send them to an LLM for summarization. This is what we find:

(topic="Automotive AI and Simulation") AND (ticker="GOOG")
This theme centers on the strategic integration and advancement of Google’s generative AI and cloud services within various sectors and partnerships. Google is enhancing its product leadership through collaborative efforts, particularly in sectors like automotive, e-commerce, and education, by leveraging technologies such as Bard and Gemini to improve user experiences, efficiency, and innovation. Major initiatives highlighted include partnerships with companies like Porsche and Mercedes-Benz to enhance in-vehicle digital experiences, as well as the utilization of AI tools in Workspace and data analytics to optimize operations and customer engagement.
(topic="Automotive AI and Simulation") AND (ticker="NVDA")
This theme centers around NVIDIA’s significant advancements and strategic positioning in the realms of artificial intelligence, automotive technology, and the development of its Omniverse platform. NVIDIA has reported robust revenue growth, particularly in its automotive sector, driven by AI automotive solutions and the rollout of its DRIVE platform, which supports automated and autonomous vehicle technologies. Partnerships with companies like Mercedes-Benz and Jaguar Land Rover exemplify NVIDIA’s innovative approach to integrating software-defined technologies into modern vehicles, thus transforming the automotive sector into a technology-driven industry.

At least in this dataset, Nvidia talks more about autonomous-vehicle technology, while Google focuses on in-vehicle digital experiences and on improving operational efficiency.
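
The retrieval step described above (filter by ticker and topic, keep the top-matching paragraphs, hand them to an LLM) can be sketched in plain Python. This is a minimal illustration, not Sturdy Statistics’ implementation: the `Paragraph` record, the `retrieve` and `build_prompt` helpers, the placeholder data, and the choice to rank paragraphs by their topic count are all hypothetical, and the actual LLM call is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    # Hypothetical record: one transcript paragraph with its metadata
    # and the per-topic counts assigned to it.
    ticker: str
    quarter: str
    text: str
    topic_counts: dict = field(default_factory=dict)


def retrieve(paragraphs, ticker, topic, k=5):
    """Metadata filter on ticker plus thematic filter on topic, top-k by count."""
    hits = [p for p in paragraphs
            if p.ticker == ticker and p.topic_counts.get(topic, 0) > 0]
    # Assumed ranking: paragraphs that express the topic more strongly come first.
    hits.sort(key=lambda p: p.topic_counts[topic], reverse=True)
    return hits[:k]


def build_prompt(topic, ticker, passages):
    """Assemble a summarization prompt for whichever LLM you use."""
    context = "\n\n".join(p.text for p in passages)
    return (f"Summarize how {ticker} discusses the theme '{topic}', "
            f"based only on these excerpts:\n\n{context}")


paragraphs = [  # placeholder data, not real transcript content
    Paragraph("NVDA", "Q1", "NVDA paragraph A", {"Automotive AI and Simulation": 2}),
    Paragraph("NVDA", "Q1", "NVDA paragraph B", {"Automotive AI and Simulation": 5}),
    Paragraph("NVDA", "Q2", "NVDA paragraph C", {"Data Center": 6}),
    Paragraph("GOOG", "Q1", "GOOG paragraph A", {"Automotive AI and Simulation": 3}),
]
top = retrieve(paragraphs, "NVDA", "Automotive AI and Simulation", k=2)
prompt = build_prompt("Automotive AI and Simulation", "NVDA", top)
```

In the workflow described here, the two filters and the ranking are expressed in a single SQL statement; the Python version simply makes the metadata filter, the thematic filter, and the top-k cut explicit.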

But suppose we don’t want to use an LLM at all; we want structured data for quantitative analysis. Easy: we can count how often each company mentioned this topic in each quarter:

 -- Total mentions of one topic per company per quarter
 SELECT ticker,
        quarter,
        SUM(sum_topic_counts['Automotive AI and Simulation'])
          AS occurrences
 FROM doc_meta
 GROUP BY ticker, quarter
 ORDER BY ticker, quarter
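
To make the aggregation concrete, here is a Python analogue of what the query computes: for each (ticker, quarter) group, it adds up one topic’s count across all documents in that group. The `doc_meta` rows below are hypothetical placeholders shaped like the table in the query, and `occurrences_by_group` is an illustrative helper, not part of any Sturdy Statistics API.

```python
from collections import defaultdict

# Hypothetical doc_meta rows (placeholder values): one row per document,
# where sum_topic_counts maps a topic name to its count in that document.
doc_meta = [
    {"ticker": "NVDA", "quarter": "Q1",
     "sum_topic_counts": {"Automotive AI and Simulation": 3}},
    {"ticker": "NVDA", "quarter": "Q1",
     "sum_topic_counts": {"Automotive AI and Simulation": 2, "Gaming": 5}},
    {"ticker": "GOOG", "quarter": "Q1",
     "sum_topic_counts": {"Cloud": 7}},
]


def occurrences_by_group(rows, topic):
    """Python analogue of SUM(sum_topic_counts[topic]) ... GROUP BY ticker, quarter."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["ticker"], row["quarter"])
        # A missing topic counts as 0 here; a SQL engine may yield NULL instead.
        totals[key] += row["sum_topic_counts"].get(topic, 0)
    return dict(totals)


print(occurrences_by_group(doc_meta, "Automotive AI and Simulation"))
# → {('NVDA', 'Q1'): 5, ('GOOG', 'Q1'): 0}
```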

Or suppose we want to run a regression to see which topics correlate with stock-price movements. You can fetch the data as follows and then analyze it to your heart’s content:

 -- One row per document: fractional price change plus all topic counts
 SELECT ticker,
        quarter,
        (day_end_price - day_begin_price) / day_begin_price
          AS price_delta,
        sum_topic_counts
 FROM doc_meta

Sturdy Statistics opens up all of your data for comprehensive analysis. Let us help you uncover the insights hidden in your unstructured data!